Skip to content

Conversation

@ldorau
Copy link
Contributor

@ldorau ldorau commented Oct 8, 2025

No description provided.

@ldorau ldorau force-pushed the Add_support_for_zeCommandListAppendLaunchKernelWithArguments branch from 816fda9 to f2b4eb5 Compare October 9, 2025 21:13
@ldorau ldorau force-pushed the Add_support_for_zeCommandListAppendLaunchKernelWithArguments branch from f2b4eb5 to a766e0e Compare October 9, 2025 21:14
@ldorau ldorau force-pushed the Add_support_for_zeCommandListAppendLaunchKernelWithArguments branch from a766e0e to 2e469a3 Compare October 9, 2025 21:31
@ldorau ldorau force-pushed the Add_support_for_zeCommandListAppendLaunchKernelWithArguments branch 2 times, most recently from 896f37c to f10396e Compare October 10, 2025 19:43
@ldorau
Copy link
Contributor Author

ldorau commented Oct 10, 2025

Rebased

@ldorau ldorau requested a review from slawekptak October 10, 2025 20:09
@ldorau ldorau force-pushed the Add_support_for_zeCommandListAppendLaunchKernelWithArguments branch 4 times, most recently from d3db808 to 1c86d5c Compare October 15, 2025 09:04
@ldorau ldorau force-pushed the Add_support_for_zeCommandListAppendLaunchKernelWithArguments branch from 1c86d5c to 8fafee6 Compare October 15, 2025 09:54
@ldorau ldorau changed the title [UR][SYCL] Add support for zeCommandListAppendLaunchKernelWithArguments() [UR][SYCL] Add support for zeCommandListAppendLaunchKernelWithArguments() Oct 15, 2025
@ldorau ldorau force-pushed the Add_support_for_zeCommandListAppendLaunchKernelWithArguments branch from 4d5e639 to 1007ca3 Compare October 29, 2025 09:09
@ldorau ldorau force-pushed the Add_support_for_zeCommandListAppendLaunchKernelWithArguments branch from 1007ca3 to 467faad Compare October 29, 2025 11:22
…ts()

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
@ldorau
Copy link
Contributor Author

ldorau commented Oct 29, 2025

@intel/dpcpp-nativecpu-reviewers @intel/llvm-reviewers-cuda @intel/llvm-reviewers-runtime @intel/unified-runtime-reviewers-opencl please review

@ldorau
Copy link
Contributor Author

ldorau commented Oct 29, 2025

@steffenlarsen Could you review this PR on behalf of @intel/dpcpp-nativecpu-reviewers ?

Copy link
Contributor

@steffenlarsen steffenlarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies, I was just granted these powers yesterday. NativeCPU changes LGTM!

@github-actions
Copy link
Contributor

@intel/llvm-gatekeepers please consider merging

@ldorau
Copy link
Contributor Author

ldorau commented Oct 30, 2025

@intel/llvm-gatekeepers please merge

@ldorau
Copy link
Contributor Author

ldorau commented Oct 30, 2025

Apologies, I was just granted these powers yesterday. NativeCPU changes LGTM!

Thank you! :-)

@kswiecicki kswiecicki merged commit 5df5f45 into intel:sycl Oct 30, 2025
49 of 50 checks passed
@ldorau ldorau deleted the Add_support_for_zeCommandListAppendLaunchKernelWithArguments branch October 30, 2025 10:16
const std::optional<int> &ImplicitLocalArg =
DeviceKernelInfo.getImplicitLocalArgPos();
std::optional<int> ImplicitLocalArg =
ProgramManager::getInstance().kernelImplicitLocalArgPos(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change introduced perf regression that I am trying to fix in #20621.
@ldorau did you make it intentionally? If yes, could you please clarify the reason?

Copy link
Contributor Author

@ldorau ldorau Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is a part of the 49e4eb2 commit that is a revert of revert of #18764.

I made it by accident during this revert, because it was present on the sycl branch at this time: 24f54ea#diff-654cc430ad3aa564d94299199b9b53ce9cb48b2301edfa8890d6bd509dd23e55L2456-L2458

So it is a mistake made during the revert of a very huge commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed by #20621 - thanks @vinser52 for spotting and fixing this issue!

}

// just a performance optimization - avoid heap allocations
static thread_local std::vector<ur_exp_kernel_arg_properties_t> UrArgs;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line introduce another perf regression. Since we are using TLS here there are a lot of calls to the __tls_get_addr on each call to this function. Actually on each access to the UrArgs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally there was:

  std::vector<ur_exp_kernel_arg_properties_t> UrArgs;
  UrArgs.reserve(Args.size());

24f54ea#diff-654cc430ad3aa564d94299199b9b53ce9cb48b2301edfa8890d6bd509dd23e55R2432-R2433

We have tried to fix it: #20316 (comment) but we have no other idea either...

aelovikov-intel pushed a commit that referenced this pull request Nov 12, 2025
…er (#20621)

The performance regression was introduced in the #20316.
We should get the position of the implicit local arg from the
deviceKernelInfo, not from the ProgramManager.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants